137 research outputs found
A possibilistic approach to latent structure analysis for symmetric fuzzy data.
In many situations the available amount of data is huge and can be intractable. When the data set is single valued, latent structure models are recognized techniques that provide a useful compression of the information. This is done by considering a regression model between observed and unobserved (latent) fuzzy variables. In this paper, an extension of latent structure analysis to deal with fuzzy data is proposed. Our extension follows the possibilistic approach, widely used in both the cluster and regression frameworks. In this case, the possibilistic approach involves the formulation of a latent structure analysis for fuzzy data by optimization. Specifically, a non-linear programming problem in which the fuzziness of the model is minimized is introduced. In order to show how our model works, the results of two applications are given.
Keywords: Latent structure analysis, symmetric fuzzy data set, possibilistic approach.
A least squares approach to Principal Component Analysis for interval valued data
Principal Component Analysis (PCA) is a well-known technique that aims to synthesize huge amounts of numerical data by means of a small number of unobserved variables, called components. In this paper, an extension of PCA to deal with interval-valued data is proposed. The method, called Midpoint Radius Principal Component Analysis (MR-PCA), recovers the underlying structure of interval-valued data by using both the midpoints (or centers) and the radii (a measure of the interval width). In order to analyze how MR-PCA works, the results of a simulation study and two applications on chemical data are presented.
Keywords: Principal Component Analysis, Least squares approach, Interval valued data, Chemical data.
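The midpoint and radius representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention that the radius is half the interval width; the function name `midpoint_radius` and the sample intervals are hypothetical, and this is not the MR-PCA method itself, only the data transformation it starts from.

```python
def midpoint_radius(intervals):
    """Split (lower, upper) intervals into midpoints and radii.

    Assumption: radius = half the interval width, so that
    lower = midpoint - radius and upper = midpoint + radius.
    """
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    radii = [(upper - lower) / 2 for lower, upper in intervals]
    return midpoints, radii


# Hypothetical interval-valued observations of a single variable.
intervals = [(1.0, 3.0), (2.0, 2.0), (-1.0, 5.0)]
mids, rads = midpoint_radius(intervals)
```

A degenerate interval such as (2.0, 2.0) has radius zero, so ordinary single-valued data is the special case in which all radii vanish.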
Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review
Fifty years have gone by since the publication of the first paper on clustering based on fuzzy set theory. In 1965, L.A. Zadeh published "Fuzzy Sets" [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba, and Zadeh [33], in which they proposed a prototype of a clustering algorithm based on fuzzy set theory.
Fuzzy C-ordered medoids clustering of interval-valued data
Fuzzy clustering for interval-valued data helps us to find natural vague boundaries in such data. The
Fuzzy c-Medoids Clustering (FcMdC) method is one of the most popular clustering methods based on a
partitioning around medoids approach. However, one of the greatest disadvantages of this method is its
sensitivity to the presence of outliers in data. This paper introduces a new robust fuzzy clustering
method named Fuzzy c-Ordered-Medoids clustering for interval-valued data (FcOMdC-ID). Huber's
M-estimators and Yager's Ordered Weighted Averaging (OWA) operators are used in the proposed
method to make it robust to outliers. The described algorithm is compared with the fuzzy c-medoids
method in experiments performed on synthetic data with different types of outliers. A real application of FcOMdC-ID is also provided.
A fuzzy taxonomy for e-Health projects
Evaluating the impact of Information Technology (IT) projects is a problematic task for policy and decision makers aiming to define roadmaps based on previous experiences. Especially in the healthcare sector, IT can support a wide range of processes, and it is difficult to compare the benefits and results of e-Health practices in order to define strategies and to assign priorities to potential investments. A first step towards the definition of an evaluation framework to compare e-Health initiatives consists in the definition of clusters of homogeneous projects that can be further analyzed through multiple case studies. However, imprecision and subjectivity affect the classification of e-Health projects that are focused on multiple aspects of the complex healthcare system scenario. In this paper we apply a method, based on advanced cluster techniques and fuzzy theories, for validating a project taxonomy in the e-Health sector. An empirical test of the method has been performed over a set of European good practices in order to define a taxonomy for classifying e-Health projects.
Articles published in or submitted to a Journal without IF refereed / of international relevanc
Quantile-Based Fuzzy Clustering of Multivariate Time Series in the Frequency Domain
Funded for open access publication: Universidade da Coruña/CISUG.
[Abstract] A novel procedure to perform fuzzy clustering of multivariate time series generated from different dependence models is proposed. Different amounts of dissimilarity between the generating models or changes in the dynamic behaviours over time are some arguments justifying a fuzzy approach, where each series is associated with all the clusters with specific membership levels. Our procedure considers quantile-based cross-spectral features and consists of three stages: (i) each element is characterized by a vector of proper estimates of the quantile cross-spectral densities, (ii) principal component analysis is carried out to capture the main differences while reducing the effects of the noise, and (iii) the squared Euclidean distance between the first retained principal components is used to perform clustering through the standard fuzzy C-means and fuzzy C-medoids algorithms. The performance of the proposed approach is evaluated in a broad simulation study where several types of generating processes are considered, including linear, nonlinear and dynamic conditional correlation models. Assessment is done in two different ways: by directly measuring the quality of the resulting fuzzy partition and by taking into account the ability of the technique to determine the overlapping nature of series located equidistant from well-defined clusters. The procedure is compared with the few alternatives suggested in the literature, substantially outperforming all of them whatever the underlying process and the evaluation scheme. Two specific applications involving air quality and financial databases illustrate the usefulness of our approach.
The authors are grateful to the anonymous referees for their comments and suggestions. The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia "CITIC" grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG.
Xunta de Galicia; ED431C-2020-14
Xunta de Galicia; ED431G 2019/0
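Stage (iii) of the procedure described above relies on the standard fuzzy C-means algorithm with squared Euclidean distances. As a rough illustration, the sketch below computes the textbook fuzzy C-means membership degrees of one feature vector given fixed cluster centres. The function name `fcm_memberships`, the fuzziness parameter `m = 2`, and the toy data are assumptions made for this example; the quantile cross-spectral features and the PCA step of the paper are not reproduced here.

```python
def fcm_memberships(point, centers, m=2.0):
    """Textbook fuzzy C-means membership degrees of one point.

    Uses squared Euclidean distances d2 and fuzziness m > 1:
    u_i = (1 / d2_i)^(1 / (m - 1)) / sum_j (1 / d2_j)^(1 / (m - 1)).
    """
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    # If the point coincides with one or more centres, share full
    # membership among those centres to avoid dividing by zero.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: a point equidistant from two centres gets equal memberships.
equidistant = fcm_memberships((0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0)])
# A point closer to the first centre gets a higher membership there.
closer = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The equidistant case mirrors the situation the abstract uses for assessment: a series located halfway between two well-defined clusters should receive similar membership in both.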
A Bayesian network to analyse basketball players' performances: a multivariate copula-based approach
Statistics in sports plays a key role in predicting winning strategies and providing objective performance indicators. Despite the growing interest in recent years in using statistical
methodologies in this field, less emphasis has been given to the multivariate approach. This
work aims at using Bayesian networks to model the joint distribution of a set of indicators
of players' performances in basketball in order to discover the set of their probabilistic relationships as well as the main determinants affecting the player's winning percentage. From a
methodological point of view, the interest is to define a suitable model for non-Gaussian data,
relaxing the strong assumption of a normal distribution in favour of a Gaussian copula. Through
the estimated Bayesian network, we discovered many interesting dependence relationships,
providing a scientific validation of some known results mainly based on experience. Finally,
some scenarios of interest have been simulated to understand the main determinants that
contribute to increasing the number of games won by a player.
Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences
Funded for open access publication: Universidade da Coruña/CISUG.
[Abstract]: Two novel distances between categorical time series are introduced. Both of them measure discrepancies between extracted features describing the underlying serial dependence patterns. One distance is based on well-known association measures, namely Cramér's V and Cohen's κ. The other one relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures which make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches are able to group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models and are computationally efficient. Extensive simulation studies show that both hard and soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.
Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C-2020-14
The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia "CITIC" grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG. The author Ángel López-Oriona is very grateful to researcher Maite Freire for her lessons about DNA theory.
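The binarization idea in this abstract, where each category is indicated by a canonical (one-hot) vector, can be sketched as follows. The function name `binarize`, the DNA alphabet ordering, and the sample sequence are hypothetical choices for illustration; the association measures the paper builds on top of the binarized series are not implemented here.

```python
def binarize(series, categories):
    """Map each observation to the canonical vector of its category.

    The vector has a 1 in the position of the observed category
    and 0 elsewhere (one-hot encoding).
    """
    position = {category: i for i, category in enumerate(categories)}
    binarized = []
    for observation in series:
        vector = [0] * len(categories)
        vector[position[observation]] = 1
        binarized.append(vector)
    return binarized


# Hypothetical short DNA fragment over the alphabet A, C, G, T.
alphabet = ["A", "C", "G", "T"]
encoded = binarize("GAT", alphabet)
```

Each column of the binarized series is a binary time series marking when one category occurs, which is the kind of object on which lagged dependence between categories can then be measured.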
Fuzzy clustering of spatial interval-valued data
In this paper, two fuzzy clustering methods for spatial interval-valued
data are proposed, i.e. the fuzzy C-Medoids clustering
of spatial interval-valued data with and without entropy regularization.
Both methods are based on the Partitioning Around
Medoids (PAM) algorithm, inheriting the great advantage of
obtaining non-fictitious representative units for each cluster.
In both methods, the units are endowed with a relation
of contiguity, represented by a symmetric binary matrix. This
can be intended both as contiguity in a physical space and as
a more abstract notion of contiguity. The performance of the
methods is demonstrated by simulation, testing the methods with
different contiguity matrices associated with natural clusters of
units. In order to show the effectiveness of the methods in
empirical studies, three applications are presented: the clustering
of municipalities based on interval-valued pollutant levels, the
clustering of European fact-checkers based on interval-valued
data on the average number of impressions received by their
tweets, and the clustering of the residential zones of the city of
Rome based on the interval of price values.